Matrix factorization-based multi-objective ranking–What makes a good university?

Non-negative matrix factorization (NMF) efficiently reduces high dimensionality in many-objective ranking problems. In multi-objective optimization, as long as only three or four conflicting viewpoints are present, an optimal solution can be determined by finding the Pareto front. When the number of objectives increases, the multi-objective problem evolves into a many-objective optimization task, where the Pareto front becomes oversaturated. The key idea is that NMF aggregates the objectives so that the Pareto front can be applied, while the Sum of Ranking Differences (SRD) method selects the objectives that have a detrimental effect on the aggregation and validates the findings. The applicability of the method is illustrated by ranking 1176 universities based on 46 variables of the CWTS Leiden Ranking 2020 database. The performance of NMF is compared to principal component analysis (PCA) and sparse non-negative matrix factorization-based solutions. The results illustrate that PCA incorporates negatively correlated objectives into the same principal component. In contrast, NMF only allows non-negative correlations, which enables the proper use of the Pareto front. With the combination of NMF and SRD, a non-biased ranking of the universities based on 46 criteria is established, in which Harvard, Rockefeller and Stanford Universities take the first three places. To evaluate the ranking capabilities of the methods, measures based on Relative Entropy (RE) and Hypervolume (HV) are proposed. The results confirm that the sparse NMF method provides the most informative ranking. The results also highlight that academic excellence can be improved by decreasing the proportion of unknown open-access publications and short-distance collaborations. The gender proportion indicators barely correlate with scientific impact. More authors, long-distance collaborations, and publications with higher average scientific impact and citation counts strongly influence the university ranking in a positive direction.
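To make the notion of a Pareto front concrete, the following minimal sketch identifies the non-dominated entries among a handful of items scored on two aggregated objectives. It is an illustration only and not the implementation used in the study: the scores are invented toy values, all objectives are assumed to be maximized, and the function names `dominates`, `pareto_front` and `pareto_levels` are hypothetical. Peeling off successive fronts to obtain rank levels is likewise an assumption about how a front can be turned into a ranking.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every objective and
    strictly better in at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def pareto_levels(points: List[Sequence[float]]) -> List[List[int]]:
    """Peel off successive Pareto fronts; level 1 is the best.
    (Assumed ranking scheme, used here for illustration only.)"""
    remaining = list(range(len(points)))
    levels = []
    while remaining:
        front = [
            i
            for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        levels.append(front)
        remaining = [i for i in remaining if i not in front]
    return levels


# Toy scores of five hypothetical universities on two aggregated objectives.
scores = [(0.9, 0.2), (0.7, 0.7), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]

front = pareto_front(scores)
levels = pareto_levels(scores)
print(front)   # indices of the non-dominated universities
print(levels)  # successive fronts, best first
```

With only two objectives, three of the five toy entries are already mutually non-dominated. As objectives are added, more entries tend to become incomparable in this sense, which is the oversaturation that motivates aggregating the objectives before the front is computed.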


S4 Appendix: CWTS Leiden Ranking 2020 database variables
The CWTS Leiden database consists of four major areas: scientific (S), collaboration (C), gender (G) and open-access (O). Each area has several indicators that are referenced by an abbreviated code throughout the article, so the abbreviations are listed and described here. The first letter denotes the area, while the number identifies the indicator within that area, e.g., S7 is the seventh indicator of the scientific area: the number of publications that belong to the top 50% most frequently cited ones. The indicators are the following:
S1 -The total number of publications of a university with a scientific impact.
S2, S8 -The total and average numbers of citations of the publications of a university.
S3, S9 -The total and average numbers of citations of the publications of a university, normalized in terms of field and year of publication.
S4, S10 -The number and proportion of a university's publications that, when compared with other publications in the same field and published in the same year, belong to the top 1% most frequently cited ones.
S5, S11 -As for S4 and S10, but for publications that belong to the top 5% most frequently cited ones.
S6, S12 -As for S4 and S10, but for publications that belong to the top 10% most frequently cited ones.
S7, S13 -As for S4 and S10, but for publications that belong to the top 50% most frequently cited ones.
C1 -The total number of publications of a university in collaboration with other universities.
C2, C7 -The number and proportion of a university's publications that have been co-authored with one or more other organizations.
C3, C8 -The number and proportion of a university's publications that have been co-authored by two or more countries.
March 28, 2023
C4, C9 -The number and proportion of a university's publications that have been co-authored with one or more industrial organizations. All private sector for-profit business enterprises, covering all manufacturing and services sectors, are regarded as industrial organizations.
C5, C10 -The number and proportion of a university's publications with a geographical collaboration distance of less than 100 km.
C6, C11 -The number and proportion of a university's publications with a geographical collaboration distance of more than 5000 km.
G1 -The total number of authorships of a university. For instance, a publication with five authors, of which three are affiliated with university 1 and two with university 2, yields three authorships for university 1 and two for university 2.
G2 -The number of male and female authorships of a university, that is, a university's number of authorships for which the gender is known.
G3, G6 -The number and proportion (with regard to G1) of authorships of a university for which the gender is unknown.
G4, G7 -The number and proportion (with regard to G1) of male authorships of a university.
G5, G8 -The number and proportion (with regard to G1) of female authorships of a university.
G6 -The number of authorships for which the gender is unknown as a proportion of a university's total number of authorships.
G9 -The number of male authorships as a proportion (with regard to G2) of a university's number of known male and female authorships.
G10 -The number of female authorships as a proportion (with regard to G2) of a university's number of known male and female authorships.
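The relations between the gender counts and the gender proportions defined above can be sketched with a short calculation. The example below is illustrative only: the authorship counts are invented, the function name `gender_proportions` is hypothetical, and it assumes that the total number of authorships (G1) equals the sum of the known (G2) and unknown (G3) authorships, which follows from the definitions but is not stated explicitly in the list.

```python
def gender_proportions(male: int, female: int, unknown: int) -> dict:
    """Derive the proportion indicators G6-G10 from the authorship counts.

    male, female and unknown play the roles of G4, G5 and G3.
    Assumption: G1 = G2 + G3, i.e. total = known + unknown authorships.
    """
    g2 = male + female   # authorships with known gender (G2)
    g1 = g2 + unknown    # all authorships (G1), under the stated assumption
    return {
        "G6": unknown / g1,  # unknown gender, relative to all authorships
        "G7": male / g1,     # male, relative to all authorships
        "G8": female / g1,   # female, relative to all authorships
        "G9": male / g2,     # male, relative to known-gender authorships
        "G10": female / g2,  # female, relative to known-gender authorships
    }


# Invented counts for a hypothetical university.
props = gender_proportions(male=50, female=30, unknown=20)
print(props)
```

By construction, G6, G7 and G8 sum to one, as do G9 and G10, so within each group the indicators are not independent of one another.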